Abstract

Consider the problem of predicting a response variable using a set of covariates in a linear
regression model. If it is known or suspected a priori that a subset of the covariates does not
contribute significantly to the overall fit of the model, a restricted model that excludes these
covariates may be sufficient. If, on the other hand, the subset provides useful information, a
shrinkage method combines the restricted and unrestricted estimators to obtain the parameter
estimates. Such an estimator outperforms the classical maximum likelihood estimator. Any
prior information may be validated through a preliminary test (or pretest) and, depending
on its validity, may be incorporated in the model as a parametric restriction. Thus, the pretest
estimator chooses between the restricted and unrestricted estimators depending on the
outcome of the preliminary test. Examples using three real-life data sets are provided
to illustrate the application of shrinkage and pretest estimation. The performance of the
positive-shrinkage and pretest estimators is compared with that of the unrestricted estimator
under varying degrees of uncertainty of the prior information. A Monte Carlo study reconfirms
the asymptotic properties of the estimators available in the literature.
Regression analysis is one of the most mature and widely applied branches of statistics. Least
squares estimation and related procedures, mostly having a parametric flavor, have received
considerable attention from theoretical as well as application perspectives. Statistical models,
both linear and non-linear, are used to obtain information about unknown parameters.
Whether such a model fits the data well, or whether the estimated parameters are of much use,
depends on the validity of certain assumptions. In this setup, the estimates are obtained
to gain insight about the parameters. However, in many practical situations, it is the
researchers who provide the estimates of the parameters, utilizing the information contained
in the sample and other relevant information. The "other" information may be considered
as non-sample information (NSI). This is also known as uncertain prior information (UPI),
or simply prior information. The non-sample information may or may not contribute positively
to the estimation procedure. Nevertheless, it may be advantageous to use the NSI
in the estimation process when the sample information is rather limited.

The quality of the fit and of the estimated parameters depends largely on the quality of
the data used to obtain them. Only reliable information leads to useful results. However,
in many practical situations, uncertainty arises as to whether the available information is
of much use. It is widely accepted that in applied science, an experiment is often performed
with some prior knowledge of the outcomes, or to confirm a hypothetical result, or to
re-establish existing results.

With this in mind, it is important to note that the consequences of
incorporating non-sample information depend on the quality or usefulness of the information
being added to the estimation process. Any uncertain prior information may be tested
before it is incorporated in the model. Following the idea of Bancroft (1944), uncertain
prior information may be validated through a preliminary test and, depending on its validity,
may be incorporated in the model as a parametric restriction; one then chooses between the
restricted and unrestricted estimation procedures depending on the outcome of the preliminary test.

Later, Stein (1956) introduced shrinkage estimation. In this framework, the shrinkage
estimator, or Stein-type estimator, takes a hybrid approach by shrinking the base estimator
towards a plausible alternative estimator, utilizing the non-sample information if it proves
to be useful.



1.1 Review of Literature 

Since its inception, shrinkage estimation has received considerable attention from
researchers. Since 1987, Ahmed and his co-researchers, among others, have analytically
demonstrated that shrinkage estimators outperform the classical maximum likelihood
estimator. Asymptotic properties of shrinkage and preliminary test estimators under a quadratic
loss function have been studied, and their dominance over the usual maximum likelihood
estimators demonstrated, in numerous studies in the literature. Ahmed (1997) gave a
detailed description of shrinkage estimation, and discussed large sample estimation techniques
in a regression model with non-normal errors.

Khan and Ahmed considered the problem of estimating the coefficient vector of a
classical regression model, and demonstrated analytically and numerically that the positive
part of the Stein-type estimator and the improved preliminary test estimator dominate the
usual Stein-type and pretest estimators, respectively.

Estimation of the mean vector of a multivariate normal distribution, under the uncertain
prior information that the component means are equal but unknown, was studied by Khan and
Ahmed. Ahmed and Nicol (2012), among others, considered various large sample
estimation techniques in a nonlinear regression model. Nonparametric estimation of the
location parameter vector when uncertain prior information about the regression parameters
is available was considered by Ahmed and Saleh (1999).

In this paper, we review positive-shrinkage and pretest estimators, and compare their
performance when certain information about a subset of the covariates is available a
priori. In particular, we apply shrinkage estimation to three real-life data sets to show the
usability of positive-shrinkage and pretest estimators for practical purposes.



2 Statement of the Problem 

Consider a regression model of the form

$$Y = X\beta + \varepsilon, \tag{2.1}$$

where $Y = (y_1, y_2, \ldots, y_n)'$ is a vector of responses, $X$ is an $n \times p$ fixed design matrix,
$\beta = (\beta_1, \ldots, \beta_p)'$ is an unknown parameter vector,
$\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ is the vector of
unobservable random errors, and the superscript $(')$ denotes the transpose of a vector or
matrix.

We do not make any distributional assumption for the errors, only that the $\varepsilon_i$ have a
cumulative distribution function $F(\varepsilon)$ with $E(\varepsilon) = 0$ and
$E(\varepsilon\varepsilon') = \sigma^2 I$, where $\sigma^2$ is
finite. We make the following two assumptions, also called the regularity conditions:

i) $\max_{1 \le i \le n} x_i'(X'X)^{-1}x_i \to 0$ as $n \to \infty$, where $x_i'$ is the $i$th row of $X$;

ii) $\lim_{n \to \infty} (X'X/n) = C$, where $C$ is a finite positive-definite matrix.

In our case, suppose that $\beta$ may be partitioned as $\beta = (\beta_1', \beta_2')'$. The sub-vectors $\beta_1$
and $\beta_2$ are assumed to have dimensions $p_1$ and $p_2$ respectively, with $p_1 + p_2 = p$ and $p_i \ge 0$
for $i = 1, 2$. Here, $\beta_1$ is the coefficient vector for the main effects, and $\beta_2$ is a vector for
"nuisance" effects. We are essentially interested in the estimation of $\beta_1$ when it is plausible
that $\beta_2$ does not contribute significantly to predicting the response. Such a situation may
arise when there is over-modeling and one wishes to cut down the irrelevant part of
the model (2.1). For example, in studying the relationship between the level of prostate
specific antigen (PSA) and some clinical measures, the log cancer volume and log prostate
weight can be considered as the main effects, while age, log of benign prostate hyperplasia
amount, seminal vesicle invasion and others can be regarded as nuisance variables. In this
situation, inference about $\beta_1$ may benefit from shrinking the regression coefficients of the
full model towards the restricted space while utilizing the available information contained
in the nuisance covariates. Thus, the parameter space can be partitioned, and it is plausible
that $\beta_2$ is near some specified $\beta_2^0$, which, without loss of generality, may be set to a null
vector. The prior information about the subset of $\beta$ can be written in terms of a restriction,
$H\beta = h$. Here, $H$ is a known $p_2 \times p$ matrix and $h$ is a $p_2 \times 1$ vector of known constants.

2.1 Organization of the Paper 

The paper is organized as follows. The statistical model is introduced in section 3. Shrink- 
age, positive-shrinkage, and pretest estimators are defined in this section. Examples using 
three real-life data sets are presented in section 4. Positive-shrinkage and pretest
estimators are obtained, and their performance is compared using cross-validation. A Monte Carlo
simulation study is described in section 5. Asymptotic bias and risk expressions for the
shrinkage estimators are presented in section 6. Finally, conclusions and future directions
are presented in section 7.

3 The Model and Estimation Strategies 

The least-squares estimator of $\beta$ is given by

$$\hat{\beta}^{UR} = (X'X)^{-1}X'Y = C^{-1}X'Y,$$

where $C = X'X$. Under the restriction $H\beta = h$, the restricted estimator is given by

$$\hat{\beta}^{R} = \hat{\beta}^{UR} - C^{-1}H'(HC^{-1}H')^{-1}(H\hat{\beta}^{UR} - h),$$

which is a linear function of the unrestricted estimator. Let us define the estimator of $\sigma^2$
by

$$s_e^2 = \frac{(Y - X\hat{\beta}^{UR})'(Y - X\hat{\beta}^{UR})}{n - p}.$$

We may consider testing the restriction in the form of testing the null hypothesis

$$H_0 : H\beta = h.$$

The test statistic is defined by

$$\psi_n = \frac{(H\hat{\beta}^{UR} - h)'(HC^{-1}H')^{-1}(H\hat{\beta}^{UR} - h)}{s_e^2}, \tag{3.1}$$

which, under $H_0$, follows a chi-square distribution with $p_2$ degrees of freedom.

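
As an illustration of the formulas above, the following Python sketch computes the unrestricted and restricted estimators, the variance estimate, and the test statistic. The function name `restricted_and_unrestricted`, the simulated data, and the particular choice of H (selecting the last p2 coefficients) are assumptions made for this example only; they are not taken from the analysis in this paper.

```python
import numpy as np


def restricted_and_unrestricted(X, y, H, h):
    """Return (beta_ur, beta_r, s2, psi_n) for the linear restriction H beta = h."""
    n, p = X.shape
    C_inv = np.linalg.inv(X.T @ X)               # C^{-1}, with C = X'X
    beta_ur = C_inv @ X.T @ y                    # unrestricted least squares
    HCH = H @ C_inv @ H.T                        # H C^{-1} H'
    gap = H @ beta_ur - h                        # H beta_ur - h
    beta_r = beta_ur - C_inv @ H.T @ np.linalg.solve(HCH, gap)
    resid = y - X @ beta_ur
    s2 = resid @ resid / (n - p)                 # estimator of sigma^2
    psi_n = gap @ np.linalg.solve(HCH, gap) / s2 # test statistic
    return beta_ur, beta_r, s2, psi_n


# Hypothetical data: p1 = 3 main effects, p2 = 4 nuisance effects set to zero.
rng = np.random.default_rng(0)
n, p1, p2 = 50, 3, 4
X = rng.normal(size=(n, p1 + p2))
beta = np.concatenate([np.ones(p1), np.zeros(p2)])
y = X @ beta + rng.normal(size=n)
H = np.hstack([np.zeros((p2, p1)), np.eye(p2)])  # H beta = beta_2
h = np.zeros(p2)

beta_ur, beta_r, s2, psi_n = restricted_and_unrestricted(X, y, H, h)
```

With this choice of H and h, the restricted estimator satisfies the restriction exactly, and its first p1 components coincide with the least-squares fit of the sub-model that uses only the first p1 covariates.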
3.1 Shrinkage Estimator 

A Stein-type estimator (STE) of $\beta_1$ can be defined as

$$\hat{\beta}_1^{S} = \hat{\beta}_1^{R} + (\hat{\beta}_1^{UR} - \hat{\beta}_1^{R})\{1 - \kappa\psi_n^{-1}\}, \quad \text{where } \kappa = p_2 - 2,\ p_2 \ge 3,$$

and $\psi_n$ is defined in (3.1).

One problem with the STE is that its components may have a different sign from the
coordinates of $\hat{\beta}_1^{UR}$. This could happen if $(p_2 - 2)\psi_n^{-1}$ is larger than unity. One possibility is
when $p_2 = 3$ and $\psi_n < 1$. From the practical point of view, the change of sign would affect
its interpretability. However, this behavior does not adversely affect the risk performance
of the STE. To overcome the sign problem, we define a positive-rule Stein-type semiparametric
estimator (PSTE) by retaining the positive part of the STE. A PSTE has the form

$$\hat{\beta}_1^{S+} = \hat{\beta}_1^{R} + (\hat{\beta}_1^{UR} - \hat{\beta}_1^{R})\{1 - \kappa\psi_n^{-1}\}^{+}, \quad p_2 \ge 3,$$

where $z^{+} = \max(0, z)$. Alternatively, this can be written as

$$\hat{\beta}_1^{S+} = \hat{\beta}_1^{R} + (\hat{\beta}_1^{UR} - \hat{\beta}_1^{R})\{1 - \kappa\psi_n^{-1}\}\, I(\psi_n > \kappa), \quad p_2 \ge 3.$$

Ahmed (2001) and others studied the asymptotic properties of Stein-type estimators in
various contexts.
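
The sign problem and its positive-part remedy can be seen in a small numerical sketch. The Python functions below are illustrative only: their names and the toy numbers are assumptions, and the inputs (the two base estimates and the value of the test statistic) are taken as given rather than computed from data.

```python
def stein_weight(psi_n, p2):
    """Weight {1 - kappa / psi_n} placed on (beta_ur - beta_r), with kappa = p2 - 2."""
    if p2 < 3:
        raise ValueError("the Stein-type estimator requires p2 >= 3")
    if psi_n <= 0:
        raise ValueError("psi_n must be positive")
    return 1.0 - (p2 - 2) / psi_n


def stein_estimate(beta_ur, beta_r, psi_n, p2):
    """Stein-type estimate: beta_r + (beta_ur - beta_r) * {1 - kappa / psi_n}."""
    w = stein_weight(psi_n, p2)
    return [r + (u - r) * w for u, r in zip(beta_ur, beta_r)]


def positive_stein_estimate(beta_ur, beta_r, psi_n, p2):
    """Positive-part version: the weight is truncated at zero."""
    w = max(0.0, stein_weight(psi_n, p2))
    return [r + (u - r) * w for u, r in zip(beta_ur, beta_r)]


# Toy inputs (hypothetical): p2 = 3 gives kappa = 1, and psi_n = 0.5 < 1.
beta_ur = [1.2, -0.1]
beta_r = [1.0, 0.0]
ste = stein_estimate(beta_ur, beta_r, psi_n=0.5, p2=3)
pste = positive_stein_estimate(beta_ur, beta_r, psi_n=0.5, p2=3)
```

In this toy case the weight equals -1, so the second component of the Stein-type estimate has the opposite sign to the unrestricted estimate, whereas the positive-part version falls back on the restricted estimate.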

3.2 Preliminary Test Estimator 

The preliminary test estimator, or pretest estimator, for the regression parameter $\beta_1$ is
obtained as

$$\hat{\beta}_1^{PT} = \hat{\beta}_1^{UR} - (\hat{\beta}_1^{UR} - \hat{\beta}_1^{R})\, I(\psi_n < c_{n,\alpha}), \tag{3.2}$$

where $I(\cdot)$ is an indicator function, and $c_{n,\alpha}$ is the upper $100(1 - \alpha)$ percentage point of the
test statistic $\psi_n$.

In a pretest estimation problem, the prior information is tested before choosing the
estimator for practical purposes, while the shrinkage and positive-shrinkage estimators incorporate
in the estimation process whatever prior information is available.
The pretest estimator either accepts or rejects the restricted estimator ($\hat{\beta}_1^{R}$) based on whether
$\psi_n < c_{n,\alpha}$, while the shrinkage estimator is a smoothed version of the pretest estimator.
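
The all-or-nothing character of the pretest rule can be sketched in a few lines of Python. The function name and the toy inputs are assumptions made for illustration; the critical value is passed in by the caller, and the value 9.488 used below is the upper 5% point of a chi-square distribution with 4 degrees of freedom.

```python
def pretest_estimate(beta_ur, beta_r, psi_n, critical_value):
    """Pretest rule of (3.2): restricted estimate if psi_n < critical_value,
    otherwise the unrestricted estimate."""
    if psi_n < critical_value:
        return list(beta_r)
    return list(beta_ur)


# Toy inputs (hypothetical), with p2 = 4 restrictions tested at the 5% level.
beta_ur = [1.2, 0.9, 1.1]
beta_r = [1.0, 1.0, 1.0]
critical_value = 9.488

accepted = pretest_estimate(beta_ur, beta_r, psi_n=3.1, critical_value=critical_value)
rejected = pretest_estimate(beta_ur, beta_r, psi_n=12.4, critical_value=critical_value)
```

A small value of the test statistic returns the restricted estimate unchanged, and a large value returns the unrestricted estimate unchanged; there is no intermediate weighting.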

4 Examples 

In the following, we study three real-life examples. For each data set, we fit linear regression
models to predict the variable of interest from the available regressors. Shrinkage and pretest
estimates are then obtained for the regression parameters. The performance of the shrinkage and
pretest estimators is assessed as per the criteria outlined in the following section.

4.1 Assessment Criteria 

In shrinkage and pretest estimation, we utilize the full-model and sub-model estimates,
and combine them in a way that shrinks the least-squares estimates towards the sub-model
estimates. In this framework, we utilize, if available, the information contained in the
restricted subspace if it contributes significantly to predicting the response. However, in
the absence of prior information about the nuisance subset, one might perform the usual variable
selection to filter the nuisance subset out of the covariates. In that case, one starts
with the model containing all the covariates. The best subset may then be selected based on
AIC, BIC or other model selection criteria. Separate estimates from the full and restricted
models are then combined to obtain shrinkage estimates. Finally, a model with shrunken
coefficients is obtained, which reduces the overall prediction error.

We obtain pretest and positive-shrinkage estimates using different sub-models. Perfor- 
mance of each pair of full- and sub-models was evaluated by estimating the prediction error 
based on K-fold cross validation. In a cross validation, the data set is randomly divided 
into K subsets of roughly equal size. One subset is left aside, and termed as test data, 
while the remaining K — 1 subsets, called training set, are used to fit the model. The fitted 
model is then used to predict the responses of the test data set. Finally, prediction errors 
are obtained by taking the squared deviation of the observed and predicted values in the 
test set. 
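
The cross validation procedure described above can be sketched as follows. This is a minimal illustration rather than the code used for the results in this paper: the function names, the single-covariate toy model, and the choice to average the squared deviations over all held-out observations are assumptions.

```python
import random


def k_fold_prediction_error(xs, ys, fit, predict, k, seed=0):
    """Mean squared prediction error from one pass of K-fold cross validation."""
    indices = list(range(len(ys)))
    random.Random(seed).shuffle(indices)        # random division of the data
    folds = [indices[j::k] for j in range(k)]   # K subsets of roughly equal size
    squared_errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        for i in test_idx:                      # predict the held-out responses
            squared_errors.append((ys[i] - predict(model, xs[i])) ** 2)
    return sum(squared_errors) / len(squared_errors)


def fit_line(xs, ys):
    """Least-squares intercept and slope for a single covariate."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_line(model, x):
    intercept, slope = model
    return intercept + slope * x


# Noise-free toy data (hypothetical), so the prediction error is essentially zero.
xs = [float(i) for i in range(20)]
ys = [1.0 + 2.0 * x for x in xs]
cv_error = k_fold_prediction_error(xs, ys, fit_line, predict_line, k=5)
```

Any estimator can be plugged in through the `fit` and `predict` arguments, and repeating the call with different values of `seed` gives the run-to-run variation of the cross validation estimate.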

We consider K = 5, 10. Both the raw cross validation estimate (CVE) and the bias corrected
cross validation estimate of prediction error are obtained for each configuration. The bias
corrected cross validation estimate is the adjusted cross-validation estimate designed to
compensate for the bias introduced by not using leave-one-out cross-validation (Tibshirani
and Tibshirani, 2009).

Since cross validation is a random process, the estimated prediction error varies across 
runs, and for different values of K. To account for the random variation, we repeat the 
cross validation process 5000 times, and estimate the average prediction errors along with 
their standard errors. The number of repetitions was initially varied, and settled with this 
as no noticeable variations in the standard errors were observed for higher values. 

4.2 Prostate Data 



Hastie et al. (2009) demonstrated various model selection techniques by fitting a linear
regression model to the prostate data. Specifically, the log of prostate-specific antigen (lpsa)
was modeled by the log cancer volume (lcavol), log prostate weight (lweight), age (age),
log benign prostatic hyperplasia amount (lbph), seminal vesicle invasion (svi), log capsular
penetration (lcp), Gleason score (gleason), and percentage of Gleason scores 4 or 5 (pgg45).
The idea is to predict lpsa from the measured variables.

The predictors were first standardized to have zero mean and unit standard deviation
before fitting the model. Several model selection criteria and shrinkage methods were tried,
details of which may be found in Hastie et al. (2009, Table 3.3, page 63). We consider the
models obtained by the AIC, BIC, and best subset selection (BSS) criteria, and treat them
as our sub-models. They are listed in Table 1.



Table 1: Full and candidate sub-models for prostate data.

Selection
Criterion    Model: Response ~ Covariates
Full Model   lpsa ~ lcavol + lweight + svi + lbph + age + lcp + gleason + pgg45
AIC          lpsa ~ lcavol + lweight + svi + lbph + age
BIC          lpsa ~ lcavol + lweight + svi
BSS          lpsa ~ lcavol + lweight

Average prediction errors and their standard errors for the pretest and shrinkage
estimators for various sub-models are shown in Table 2. Prediction errors are based on five-
and ten-fold cross validation. Averages and standard errors are obtained after repeating the
process 5000 times.



Table 2: Average prediction errors for various estimators based on K-fold cross validation
repeated 5000 times for prostate data. Standard errors are shown in parentheses.

              Raw CVE                          Bias Corrected CVE
Estimator     K = 5           K = 10           K = 5           K = 10
UR            0.556 (0.030)   0.548 (0.018)    0.543 (0.026)   0.542 (0.017)
R(AIC)        0.535 (0.023)   0.529 (0.014)    0.525 (0.020)   0.523 (0.013)
R(BIC)        0.537 (0.020)   0.533 (0.012)    0.529 (0.018)   0.529 (0.011)
R(BSS)        0.582 (0.017)   0.578 (0.010)    0.576 (0.015)   0.576 (0.009)
PS(AIC)       0.554 (0.029)   0.547 (0.018)    0.540 (0.025)   0.541 (0.017)
PS(BIC)       0.546 (0.026)   0.541 (0.016)    0.533 (0.023)   0.535 (0.015)
PS(BSS)       0.549 (0.026)   0.542 (0.016)    0.536 (0.023)   0.536 (0.015)
PT(AIC)       0.536 (0.024)   0.529 (0.014)    0.526 (0.021)   0.525 (0.014)
PT(BIC)       0.538 (0.021)   0.533 (0.012)    0.529 (0.019)   0.529 (0.011)
PT(BSS)       0.599 (0.030)   0.601 (0.024)    0.602 (0.036)   0.605 (0.029)




Looking at the bias corrected cross validation estimates of the prediction errors, on
average the restricted and pretest estimators based on AIC have the smallest prediction
errors. These are followed by the pretest and restricted estimators based on BIC. Interestingly,
the average prediction errors based on the sub-model given by BSS are much higher than those
obtained from the models based on AIC or BIC. For instance, with ten-fold cross validation the
restricted model based on BSS has average prediction error 0.576, and that of the pretest
estimator is 0.605. For the same sub-model, the positive-shrinkage estimator has average
prediction error 0.536, which is much less than that of R(BSS) and PT(BSS). Clearly, the
positive-shrinkage estimator beats the restricted and pretest estimators for this sub-model.
This is a classic example where the utility of the positive-shrinkage estimator is practically
realized. Restricted and/or pretest estimation may perform better under correct specification
of the model (e.g., the models given by AIC and BIC for this data set), whereas the
positive-shrinkage estimator is less sensitive to model misspecification.

Apparently, in the presence of imprecise subspace information, the restricted and pretest
estimators fail to produce the estimates with the smallest average prediction errors. On the
other hand, the positive-shrinkage estimator maintains a steady risk-superiority under model
misspecification. This behaviour is illustrated in more detail through a Monte Carlo study
in section 5.



4.3 State Data 



Faraway (2002) illustrated variable selection methods on a data set called state. There are
50 observations (cases) on 8 variables. The variables are: population estimate as of July
1, 1975; per capita income (1974); illiteracy (1970, percent of population); life expectancy
in years (1969-71); murder and non-negligent manslaughter rate per 100,000 population
(1976); percent high-school graduates (1970); mean number of days with minimum
temperature below 32 degrees (1931-1960) in capital or large city; and land area in square miles. We
consider life expectancy as the response. It was found that population, murder, high school
graduates, and temperature produce the best model based on AIC or BIC. A model based
on the $C_p$ statistic that includes population, high school graduates, and temperature showed
the largest adjusted $R^2$. All the models are listed in Table 3.



Table 3: Full and candidate sub-models for state data.

Selection
Criterion    Model: Response ~ Covariates
Full         Life.exp ~ Population + Murder + Hs.grad + Frost + Income + Illiteracy + Area
AIC/BIC      Life.exp ~ Population + Murder + Hs.grad + Frost
CP           Life.exp ~ Murder + Hs.grad + Frost



When the models are correctly specified, the restricted estimator is expected to
perform best. Such is the scenario for the state data, where the models given by AIC
and BIC are the same, and the restricted estimator has the smallest prediction error. Under
model uncertainty, however, the scenario changes completely, as the risk of the restricted
estimator becomes unbounded when the sub-model deviates from the true structure. This is explored
in the simulation study presented in section 5. For the correctly specified models, such as in
Table 4, we see that the restricted and pretest estimators have the smallest average prediction
errors for both five-fold and ten-fold cross validation. The bias corrected versions of the cross
validation errors are exactly the same for the restricted and pretest estimators.

Table 4: Average prediction errors (thousands) for various estimators based on K-fold
cross validation, repeated 5000 times for state data. Standard errors are shown in
parentheses.

              Raw CVE                          Bias Corrected CVE
Estimator     K = 5           K = 10           K = 5           K = 10
UR            0.879 (0.144)   0.847 (...)      0.819 (0.119)   0.820 (0.079)
R(AIC)        0.637 (0.063)   0.614 (0.0...)   0.599 (0.052)   0.597 (0.033)
R(CP)         0.639 (0.058)   0.639 (0.0...)   0.626 (0.048)   0.626 (0.031)
PS(AIC)       0.740 (0.124)   0.690 (0.0...)   0.696 (0.104)   0.671 (0.068)
PS(CP)        0.768 (0.106)   0.746 (0.0...)   0.727 (0.090)   0.727 (0.058)
PT(AIC)       0.637 (0.066)   0.614 (0.0...)   0.599 (0.054)   0.597 (0.033)
PT(CP)        0.662 (0.069)   0.639 (0.0...)   0.629 (0.059)   0.626 (0.032)

4.4 Galapagos Data

Faraway (2002) analyzed data on species diversity on the Galapagos islands. The
Galapagos data contain 30 rows and seven variables. Each row represents an island, and
the covariates represent various geographic measurements. The relationship between the
number of species of tortoise and several geographic variables is of interest. The data set
has the following variables: Species represents the number of species of tortoise found on
the island, Endemics represents the number of endemic species, Area represents the area of
the island (km$^2$), Elevation measures the highest elevation of the island (m), Nearest is
the distance from the nearest island (km), Scruz measures the distance from Santa Cruz
island (km), and Adjacent measures the area of the adjacent island (km$^2$). The original data set
contained missing values for some of the covariates, which have been imputed by Faraway
(2002) for convenience.

The full model and the sub-models based on AIC and BIC are shown in Table 5.



Table 5: Full and candidate sub-models for Galapagos data.

Selection
Criterion    Model: Response ~ Covariates
Full         Species ~ Endemics + Area + Elevation + Nearest + Scruz + Adjacent
AIC          Species ~ Endemics + Area + Elevation
BIC          Species ~ Endemics





We obtain restricted, pretest, and positive-shrinkage estimates of the regression
parameters of the Galapagos data. Average prediction errors along with their standard errors for
the unrestricted (UR), restricted (R), positive-shrinkage (PS), and preliminary test or pretest
(PT) estimators are presented in Table 6. Prediction errors and standard errors are
shown in thousands. PS(AIC) represents positive-shrinkage estimates based on the sub-model
given by AIC, and PS(BIC) represents the same based on BIC. PT(AIC) and PT(BIC) are
similarly defined for pretest estimators.

Table 6: Average prediction errors (thousands) for various estimators based on K-fold
cross validation, repeated 5000 times for Galapagos data. Standard errors are shown in
parentheses.

              Raw CVE                        Bias Corrected CVE
Estimator     K = 5          K = 10          K = 5          K = 10
UR            13.87 (8.36)   12.63 (4.36)    11.31 (6.70)   11.48 (3.93)
R(AIC)        12.45 (6.96)   11.62 (4.28)    10.10 (5.57)   10.53 (3.85)
R(BIC)         1.78 (0.59)    1.65 (0.24)     1.46 (0.43)    1.51 (0.29)
PS(AIC)       13.19 (7.82)   11.98 (4.29)    10.75 (6.27)   10.88 (3.87)
PS(BIC)        9.07 (6.53)    7.96 (3.75)     7.54 (5.24)    7.32 (3.38)
PT(AIC)       12.50 (6.98)   11.63 (4.29)    10.14 (5.58)   10.54 (3.86)
PT(BIC)        5.39 (7.56)    3.90 (6.16)     4.40 (6.08)    3.55 (5.56)

For this example as well, since we have selected our sub-models based on AIC or BIC,
they are likely to be true, which results in the restricted and pretest estimators being the best
estimators in terms of prediction errors. We notice that the models based on BIC are smaller
in size, and their average prediction errors are smaller than those of the AIC models. The
difference in average prediction errors for the two sub-models is noticeably large. Such a
large difference between the competing sub-models reflects the uncertainty in model
specification, and the consequences that it causes. The Monte Carlo study conducted later in the
paper (section 5) reveals the sensitivity of the restricted and pretest estimators, and shows that
the pretest and restricted estimators are outperformed by positive-shrinkage estimators when
the underlying model is misspecified.

It is noted here that the prediction errors are unusually large for this data set. This 
indicates that the predictors are not quite capturing the variability in the response. 

5 Simulation Studies 

Monte Carlo simulation experiments have been conducted to examine the quadratic risk
performance of the positive-shrinkage and pretest estimators. We simulate the response from
the following model:

$$y_i = x_{1i}\beta_1 + x_{2i}\beta_2 + \cdots + x_{pi}\beta_p + \varepsilon_i, \quad i = 1, \ldots, n,$$

where $x_{1i} = (\zeta_{1i}^{(1)})^2 + \zeta_i^{(1)} + \xi_{1i}$, $x_{2i} = (\zeta_{2i}^{(1)})^2 + \zeta_i^{(1)} + 2\xi_{2i}$,
and $x_{si} = (\zeta_{si}^{(1)})^2 + \zeta_i^{(1)}$, with $\zeta_{si}^{(1)}$ i.i.d. $\sim N(0, 1)$,
$\zeta_i^{(1)}$ i.i.d. $\sim N(0, 1)$, $\xi_{1i} \sim \text{Bernoulli}(0.45)$ and $\xi_{2i} \sim \text{Bernoulli}(0.45)$
for all $s = 3, \ldots, p$ and $i = 1, \ldots, n$. Moreover, the $\varepsilon_i$ are i.i.d. $N(0, 1)$.

We are interested in testing the hypothesis $H_0 : \beta_j = 0$, for $j = p_1 + 1, p_1 + 2, \ldots, p_1 + p_2$,
with $p = p_1 + p_2$. Accordingly, we partition the regression coefficients as $\beta = (\beta_1, \beta_2) =
(\beta_1, 0)$. We show results for $\beta_1 = (1, 1, 1)$ and $\beta_1 = (1, 1, 1, 1)$ only.

The number of simulations was initially varied. Finally, each realization was repeated
2000 times to obtain stable results. For each realization, we calculated the bias of the estimators.
We defined $\Delta = \|\beta - \beta^{(0)}\|$, where $\beta^{(0)} = (\beta_1, 0)$, and $\|\cdot\|$ is the Euclidean norm. To
determine the behavior of the estimators for $\Delta > 0$, further data sets were generated from
those distributions under the local alternative hypothesis. Various values of $\Delta$ in $[0, 1]$ have
been considered.

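The risk performance of an estimator of $\beta_1$ was measured by comparing its MSE with
that of the unrestricted estimator as defined below:

$$\text{RMSE}(\hat{\beta}_1^{UR} : \hat{\beta}_1^{*}) = \frac{\text{MSE}(\hat{\beta}_1^{UR})}{\text{MSE}(\hat{\beta}_1^{*})}, \tag{5.1}$$

where $\hat{\beta}_1^{*}$ is one of the estimators considered in this study. The amount by which an RMSE
is larger than unity indicates the degree of superiority of the estimator $\hat{\beta}_1^{*}$ over $\hat{\beta}_1^{UR}$.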
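
The ratio in (5.1) is straightforward to compute from simulated estimates. The sketch below is illustrative: the function names are hypothetical, and the MSE is taken here to be the average squared Euclidean distance between the replicated estimates and the true coefficient vector, which is an assumption about how the simulated MSE is formed.

```python
def simulated_mse(estimates, beta_true):
    """Average squared Euclidean distance between replicated estimates and the truth."""
    total = 0.0
    for estimate in estimates:
        total += sum((e - b) ** 2 for e, b in zip(estimate, beta_true))
    return total / len(estimates)


def relative_mse(ur_estimates, other_estimates, beta_true):
    """MSE of the unrestricted estimator divided by the MSE of a competing estimator."""
    return simulated_mse(ur_estimates, beta_true) / simulated_mse(other_estimates, beta_true)


# Two replications of a one-dimensional toy problem (hypothetical numbers).
beta_true = [1.0]
ur_estimates = [[1.2], [0.8]]
shrinkage_estimates = [[1.1], [0.9]]
rmse = relative_mse(ur_estimates, shrinkage_estimates, beta_true)
```

In this toy case the competing estimator has one quarter of the mean squared error of the unrestricted estimator, so the ratio is approximately 4; values below 1 would indicate that the unrestricted estimator is preferable.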

RMSEs for the positive-shrinkage and pretest estimators were computed for $n = 30, 50, 100$,
$p_1 = 3, 6, 9$, and $p_2 = 4, 6, 9$. Since the results are similar for all the configurations, we list
the RMSEs in Table 7 for $n = 50$. Comparative RMSEs for the positive-shrinkage and pretest
estimators for $(p_1, p_2) = (3, 3), (3, 6), (4, 3)$, and $(4, 6)$ are illustrated in Figure 1.



Figure 1: Relative mean squared error for restricted, positive-shrinkage, and pretest
estimators for $n = 50$, with panels (a) $p_1 = 3$, $p_2 = 3$; (b) $p_1 = 3$, $p_2 = 6$; (c) $p_1 = 4$, $p_2 = 3$;
and (d) $p_1 = 4$, $p_2 = 6$. The horizontal axis of each panel covers $\Delta$ from 0.0 to 1.0.



5.1 Case 1: $\Delta = 0$

Clearly, for $\Delta = 0$, the restricted estimator outperforms all other estimators for all the
cases considered in the simulation study. As the restriction moves away from $\Delta = 0$, the
risk of the restricted estimator becomes unbounded (see the sharply decaying curve that goes
below the horizontal line at RMSE $= 1$ for $\Delta > 0$). The RMSE of the positive-shrinkage estimator
approaches 1 at the slowest rate (for a range of $\Delta$) as we move away from $\Delta = 0$. This indicates
that in the event of imprecise subspace information (i.e., even if $\beta_2 \ne 0$), it has the smallest
quadratic risk among all the estimators for a range of $\Delta$. The pretest estimator outshines the
shrinkage estimators when $\Delta$ is in the neighbourhood of zero. Otherwise, it becomes unbounded
at a faster rate than the restricted estimator. However, as $\Delta$ increases, at some
point the RMSE of the pretest estimator approaches 1 from below. This phenomenon suggests that
neither the pretest nor the restricted estimator is uniformly better than the other when $\Delta > 0$.

Table 7: Simulated relative mean squared error for restricted, positive-shrinkage, and pretest
estimators with respect to unrestricted estimator for $p_1 = 4$, and $p_2 = 6$ for different $\Delta$
when $n = 50$.

$\Delta^*$    $\hat{\beta}_1^{R}$    $\hat{\beta}_1^{S+}$    $\hat{\beta}_1^{PT}$
0.00    3.25    2.17    2.59
0.05    3.10    2.06    2.30
0.11    2.63    1.83    1.77
0.16    2.02    1.57    1.31
0.21    1.60    1.39    1.04
0.26    1.23    1.27    0.91
0.32    0.98    1.20    0.89
0.37    0.77    1.15    0.89
0.42    0.63    1.12    0.93
0.47    0.51    1.09    0.96
0.53    ...     ...     ...
0.58    0.36    1.06    0.99
0.63    0.31    1.06    1.00
0.68    0.27    1.05    1.00
0.74    0.23    1.04    1.00
0.79    0.20    1.03    1.00
0.84    0.18    1.03    1.00
0.89    0.16    1.02    1.00
0.95    0.15    1.03    1.00
1.00    0.13    1.02    1.00

5.2 Case 2: $\Delta > 0$

Simulation results suggest that the positive-shrinkage estimator maintains its superiority over
the restricted and pretest estimators for a wide range of $\Delta$. In particular, when $p_2 = 3$,
the performance of the positive-shrinkage estimator is superior for $\Delta$ up to around 0.35, after
which point it is as good as the unrestricted estimator (panels (a) and (c) in Figure 1).
However, when $p_2 = 6$, the positive-shrinkage estimator maintains its risk-superiority over all
other estimators for a wider range of $\Delta$ (see panels (b) and (d) in Figure 1). This clearly
suggests that a positive-shrinkage estimator is preferred, as there always remains uncertainty
in specifying statistical models correctly. Moreover, one cannot go wrong with the
positive-shrinkage estimators even if the assumed model is grossly wrong. In such cases, the estimates
are at least as good as the unrestricted (i.e., full model) estimates.

In the following sections, we review the asymptotic properties of the estimators, and 
analytically present their bias and risk expressions. 
6 Asymptotic Distribution of the Estimators



In this section we present the asymptotic distributions of the estimators and of the test
statistic $\psi_n$. This facilitates finding the asymptotic distributional bias (ADB), asymptotic
quadratic distributional bias (AQDB), and asymptotic distributional quadratic risk (ADQR) of
the estimators of $\beta$.

Under a fixed alternative, the asymptotic distribution of $\sqrt{n}(\beta^{*} - \beta)/s_e$ is equivalent to
that of $\sqrt{n}(\hat{\beta}^{UR} - \beta)/s_e$. This suggests that in the asymptotic setup, there is not much to
investigate under a fixed alternative such as $H\beta \ne h$. Therefore, to obtain meaningful asymptotics, a
class of local alternatives, $\{K_n\}$, is considered, which is given by

$$K_n : H\beta = h + \frac{\omega}{\sqrt{n}}, \tag{6.1}$$

where $\omega = (\omega_1, \omega_2, \ldots, \omega_{p_2})' \in \mathbb{R}^{p_2}$ is a fixed vector. We notice that $\omega = 0$ implies $H\beta = h$,
i.e., the null hypothesis is a particular case of (6.1). In the following, we evaluate the
performance of each estimator under the local alternatives.

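For an estimator $\beta^{*}$ and a positive-definite matrix $W$, we define a loss function of the
form

$$L(\beta^{*}; \beta) = n(\beta^{*} - \beta)'W(\beta^{*} - \beta).$$

These loss functions are generally known as weighted quadratic loss functions, where $W$ is
the weighting matrix. For $W = I$, it is the simple squared error loss function.
The expectation of the loss function,

$$E[L(\beta^{*}, \beta); W] = R[(\beta^{*}, \beta); W],$$

is called the risk function, which can be written as

$$\begin{aligned}
R[(\beta^{*}, \beta); W] &= nE[(\beta^{*} - \beta)'W(\beta^{*} - \beta)] \\
&= n\,\mathrm{tr}[W\{E(\beta^{*} - \beta)(\beta^{*} - \beta)'\}] \\
&= \mathrm{tr}(W\Gamma^{*}),
\end{aligned} \tag{6.2}$$

where $\Gamma^{*} = nE[(\beta^{*} - \beta)(\beta^{*} - \beta)']$ is the dispersion matrix of $\sqrt{n}(\beta^{*} - \beta)$.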

The performance of the estimators can be evaluated by comparing the risk functions
with a suitable matrix $W$. An estimator with a smaller risk is preferred. The estimator $\beta^{*}$
will be called inadmissible if there exists another estimator $\beta^{0}$ such that

$$R(\beta^{0}, \beta) \le R(\beta^{*}, \beta) \quad \forall (\beta, W), \tag{6.3}$$

with strict inequality holding for some $\beta$. In such a case, we say that the estimator $\beta^{0}$ dominates
$\beta^{*}$. If, however, instead of (6.3) holding for every $n$, we have

$$\lim_{n \to \infty} R(\beta^{0}, \beta) \le \lim_{n \to \infty} R(\beta^{*}, \beta) \quad \forall \beta, \tag{6.4}$$

with strict inequality for some $\beta$, then $\beta^{*}$ is termed an asymptotically inadmissible estimator
of $\beta$. The expression in (6.3) is not easy to prove. An alternative is to consider the
asymptotic distributional quadratic risk (ADQR) for the sequence of local alternatives $\{K_n\}$.
Suppose that the asymptotic cumulative distribution function (cdf) of $\sqrt{n}(\beta^{*} - \beta)/s_e$ under
$\{K_n\}$ exists, and is defined as

$$G(y) = \lim_{n \to \infty} P[\sqrt{n}(\beta^{*} - \beta)/s_e \le y].$$

This is known as the asymptotic distribution function (ADF) of $\beta^{*}$. Further, let

$$\Gamma = \int \cdots \int yy'\,dG(y)$$

be the dispersion matrix obtained from the ADF. The ADQR may then be defined as

$$R(\beta^{*}; \beta) = \mathrm{tr}(W\Gamma). \tag{6.5}$$

An estimator $\beta^{*}$ is said to dominate an estimator $\beta^{0}$ asymptotically if $R(\beta^{*}; \beta) \le
R(\beta^{0}; \beta)$. Further, $\beta^{*}$ strictly dominates $\beta^{0}$ if $R(\beta^{*}; \beta) < R(\beta^{0}; \beta)$ for some $(\beta, W)$. The
asymptotic risk may be obtained by replacing $\Gamma$ with the limit of the actual dispersion
matrix of $\sqrt{n}(\beta^{*} - \beta)$ in the ADQR function. However, this may require some extra
regularity conditions. Sen (1986) and Saleh and Sen (1985), among others, have explained
this point in various other contexts.

6.1 Asymptotic Bias and Risk Performance 

To obtain the asymptotic distribution of the proposed estimators and the test statistic
$\psi_n$, we consider the following theorem.

Theorem 6.1. Under the regularity conditions, and if $\sigma^2 < \infty$, as $n \to \infty$,

$$\sqrt{n}\, s_e^{-1} (\hat{\beta}^{UR} - \beta) \xrightarrow{d} N_p(0, C^{-1}).$$

6.1.1 Bias Performance 

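The asymptotic distributional bias (ADB) of an estimator $\delta$ is defined as

$$\mathrm{ADB}(\delta) = \lim_{n \to \infty} E\{n^{1/2}(\delta - \beta_1)\}.$$

Theorem 6.2. Under the assumed regularity conditions and Theorem 6.1, and under
$\{K_n\}$, the ADBs of the estimators are as follows:

$$\mathrm{ADB}(\hat{\beta}^{UR}) = 0, \qquad (6.6)$$

$$\mathrm{ADB}(\hat{\beta}^{R}) = -C^{-1} H' B^{-1} \omega, \qquad (6.7)$$

$$\mathrm{ADB}(\hat{\beta}^{PT}) = -C^{-1} H' B^{-1} \omega \, H_{p_2+2}(\chi^2_{p_2,\alpha}; \Delta), \qquad (6.8)$$

$$\begin{aligned}
\mathrm{ADB}(\hat{\beta}^{S+}) = -C^{-1} H' B^{-1} \omega \big[ & H_{p_2+2}(p_2 - 2; \Delta) + (p_2 - 2) E\{\chi^{-2}_{p_2+2}(\Delta)\} \\
& + E\{\chi^{-2}_{p_2+2}(\Delta) I(\chi^2_{p_2+2}(\Delta) > p_2 - 2)\} \big], \qquad (6.9)
\end{aligned}$$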

where 

f oo 



poo 

Jo 
and A) is the cdf of a p-variate normal distribution with mean vector 0, and covariance
matrix, A. 
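The quantities $H_\nu(x; \Delta)$ and $E\{\chi^{-2j}_\nu(\Delta)\}$ in Theorem 6.2 can be evaluated numerically. The sketch below assumes the reading that is standard in this literature, namely that $H_\nu(\cdot; \Delta)$ is the cdf of a noncentral chi-square distribution with $\nu$ degrees of freedom and noncentrality parameter $\Delta$, and it uses the representation of that distribution as a Poisson($\Delta/2$) mixture of central chi-squares. The function names are hypothetical, and the series used for the incomplete gamma function is intended for moderate arguments only.

```python
import math


def poisson_weights(delta, tol=1e-15):
    """Weights P(K = k) for K ~ Poisson(delta / 2), truncated once negligible."""
    lam = delta / 2.0
    weights, w, k = [], math.exp(-lam), 0
    while k <= lam or w > tol:
        weights.append(w)
        k += 1
        w *= lam / k
    return weights


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x) by its power series."""
    if x <= 0.0:
        return 0.0
    term = math.exp(-x + a * math.log(x) - math.lgamma(a + 1.0))
    total, n = term, 0
    while term > 1e-17 * total and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    return min(total, 1.0)


def ncx2_cdf(x, nu, delta):
    """Assumed H_nu(x; delta): cdf of a noncentral chi-square variable."""
    return sum(w * reg_lower_gamma((nu + 2 * k) / 2.0, x / 2.0)
               for k, w in enumerate(poisson_weights(delta)))


def inv_moment(nu, delta, j=1):
    """E[chi^{-2j}_nu(delta)] for j = 1 or 2; requires nu > 2j."""
    total = 0.0
    for k, w in enumerate(poisson_weights(delta)):
        m = nu + 2 * k  # degrees of freedom of the k-th mixture component
        total += w / (m - 2) if j == 1 else w / ((m - 2) * (m - 4))
    return total


# Example with p2 = 5 restrictions and noncentrality 1.5
p2, delta = 5, 1.5
print(ncx2_cdf(p2 - 2, p2 + 2, delta), inv_moment(p2 + 2, delta))
```

The inverse moments rely on the central chi-square facts $E[\chi^{-2}_m] = 1/(m-2)$ and $E[\chi^{-4}_m] = 1/\{(m-2)(m-4)\}$, averaged over the Poisson weights.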

The bias expressions for the estimators are not in scalar form. We therefore convert
them into quadratic form. Let us define the asymptotic quadratic distributional bias
(AQDB) of an estimator $\delta$ of $\beta_1$ by

$$\mathrm{AQDB}(\delta) = [\mathrm{ADB}(\delta)]' \, \Sigma \, [\mathrm{ADB}(\delta)],$$

where $\Sigma^{-1} = \sigma^2 C^{-1}$ is the dispersion matrix of $\hat{\beta}^{UR}$ as $n \to \infty$.

Using the definition, and following Ahmed (1997), the asymptotic quadratic distributional
biases of the various estimators are presented below.



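$$\mathrm{AQDB}(\hat{\beta}^{UR}) = 0, \qquad (6.10)$$

$$\mathrm{AQDB}(\hat{\beta}^{R}) = \Delta, \qquad (6.11)$$

$$\mathrm{AQDB}(\hat{\beta}^{PT}) = \Delta \{H_{p_2+2}(\chi^2_{p_2,\alpha}; \Delta)\}^2, \qquad (6.12)$$

$$\begin{aligned}
\mathrm{AQDB}(\hat{\beta}^{S+}) = \Delta \big[ & H_{p_2+2}(p_2 - 2; \Delta) + (p_2 - 2) E\{\chi^{-2}_{p_2+2}(\Delta)\} \\
& + E\{\chi^{-2}_{p_2+2}(\Delta) I(\chi^2_{p_2+2}(\Delta) > p_2 - 2)\} \big]^2. \qquad (6.13)
\end{aligned}$$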



6.1.2 Risk Performance 



Following Ahmed (1997), we present the risk expressions of the estimators.

Theorem 6.3. Under the assumed regularity conditions and the local alternatives $\{K_n\}$, the
ADQR expressions are as follows:



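$$R(\hat{\beta}^{UR}; W) = \sigma^2 \mathrm{tr}(W C^{-1}), \qquad (6.14)$$

$$R(\hat{\beta}^{R}; W) = \sigma^2 \mathrm{tr}(W C^{-1}) - \sigma^2 \mathrm{tr}(Q) + \omega' B^{-1} Q \omega, \qquad (6.15)$$

$$\begin{aligned}
R(\hat{\beta}^{S}; W) ={}& \sigma^2 \mathrm{tr}(W C^{-1}) - (p_2 - 2) \sigma^2 \mathrm{tr}(Q) \big\{ 2 E[\chi^{-2}_{p_2+2}(\Delta)] - (p_2 - 2) E[\chi^{-4}_{p_2+2}(\Delta)] \big\} \\
& + (p_2 - 2)(p_2 + 2) \, \omega' B^{-1} Q \omega \, E[\chi^{-4}_{p_2+4}(\Delta)], \qquad (6.16)
\end{aligned}$$

$$\begin{aligned}
R(\hat{\beta}^{PT}; W) ={}& \sigma^2 \mathrm{tr}(W C^{-1}) - \sigma^2 \mathrm{tr}(Q) H_{p_2+2}(\chi^2_{p_2,\alpha}; \Delta) \\
& + \omega' B^{-1} Q \omega \big\{ 2 H_{p_2+2}(\chi^2_{p_2,\alpha}; \Delta) - H_{p_2+4}(\chi^2_{p_2,\alpha}; \Delta) \big\}, \qquad (6.17)
\end{aligned}$$

$$\begin{aligned}
R(\hat{\beta}^{S+}; W) ={}& R(\hat{\beta}^{S}; W) \\
& + (p_2 - 2) \sigma^2 \mathrm{tr}(Q) \big[ 2 E\{\chi^{-2}_{p_2+2}(\Delta) I(\chi^2_{p_2+2}(\Delta) < p_2 - 2)\} \\
& \qquad - (p_2 - 2) E\{\chi^{-4}_{p_2+2}(\Delta) I(\chi^2_{p_2+2}(\Delta) < p_2 - 2)\} \big] \\
& - \sigma^2 \mathrm{tr}(Q) H_{p_2+2}(p_2 - 2; \Delta) + \omega' B^{-1} Q \omega \big\{ 2 H_{p_2+2}(p_2 - 2; \Delta) - H_{p_2+4}(p_2 - 2; \Delta) \big\} \\
& - (p_2 - 2) \, \omega' B^{-1} Q \omega \big[ 2 E\{\chi^{-2}_{p_2+2}(\Delta) I(\chi^2_{p_2+2}(\Delta) < p_2 - 2)\} \\
& \qquad - 2 E\{\chi^{-2}_{p_2+4}(\Delta) I(\chi^2_{p_2+4}(\Delta) < p_2 - 2)\} \\
& \qquad + (p_2 - 2) E\{\chi^{-4}_{p_2+4}(\Delta) I(\chi^2_{p_2+4}(\Delta) < p_2 - 2)\} \big], \qquad (6.18)
\end{aligned}$$

where $Q = H C^{-1} W C^{-1} H' B^{-1}$.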
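To get a feel for how the ADQR expressions for the unrestricted, restricted, and shrinkage estimators behave as $\Delta$ grows, they can be evaluated numerically. The sketch below is illustrative only. It treats $\mathrm{tr}(WC^{-1})$, $\mathrm{tr}(Q)$, and $\omega' B^{-1} Q \omega$ as given scalar inputs, and the example assumes the special case $W = C$ together with $B = HC^{-1}H'$ and $\Delta = \omega' B^{-1} \omega / \sigma^2$ (definitions that are not restated in this section), so that $\mathrm{tr}(WC^{-1}) = p$, $\mathrm{tr}(Q) = p_2$, and $\omega' B^{-1} Q \omega = \sigma^2 \Delta$. The function names are hypothetical.

```python
import math


def inv_moment(nu, delta, j):
    """E[chi^{-2j}_nu(delta)], j in {1, 2}, via the Poisson mixture (nu > 2j)."""
    lam, w, k, total = delta / 2.0, math.exp(-delta / 2.0), 0, 0.0
    while k <= lam or w > 1e-15:
        m = nu + 2 * k  # degrees of freedom of the k-th mixture component
        total += w / (m - 2) if j == 1 else w / ((m - 2) * (m - 4))
        k += 1
        w *= lam / k
    return total


def adqr(p2, delta, sigma2, tr_WCinv, tr_Q, quad):
    """ADQR of UR, R and S as in (6.14)-(6.16).

    `quad` stands for the quadratic form omega' B^{-1} Q omega.
    """
    c = p2 - 2
    r_ur = sigma2 * tr_WCinv
    r_r = r_ur - sigma2 * tr_Q + quad
    r_s = (r_ur
           - c * sigma2 * tr_Q * (2 * inv_moment(p2 + 2, delta, 1)
                                  - c * inv_moment(p2 + 2, delta, 2))
           + c * (p2 + 2) * quad * inv_moment(p2 + 4, delta, 2))
    return {"UR": r_ur, "R": r_r, "S": r_s}


# Assumed special case W = C: tr(W C^{-1}) = p, tr(Q) = p2, quad = sigma2 * delta
p, p2, sigma2 = 10, 6, 1.0
for delta in (0.0, 2.0, 6.0, 20.0):
    risks = adqr(p2, delta, sigma2, tr_WCinv=p, tr_Q=p2, quad=sigma2 * delta)
    print(delta, {name: round(value, 3) for name, value in risks.items()})
```

In this special case the restricted risk grows linearly in $\Delta$ and eventually exceeds the unrestricted risk, while the shrinkage risk stays below the unrestricted risk for every $\Delta$.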



Ahmed (1997) studied the statistical properties of various shrinkage and pretest
estimators. It was remarked that none of the unrestricted, restricted, and pretest estimators
is inadmissible with respect to any of the others. However, at $\Delta = 0$,

$$\hat{\beta}^{R} \succ \hat{\beta}^{PT} \succ \hat{\beta}^{UR}.$$

Moreover, for all $(\Delta; W)$ and $p_2 \ge 3$,

$$R(\hat{\beta}^{S+}; W) \le R(\hat{\beta}^{S}; W) \le R(\hat{\beta}^{UR}; W)$$

is satisfied. Thus, we conclude that $\hat{\beta}^{S+}$ performs better than $\hat{\beta}^{UR}$ in
the entire parameter space induced by $\Delta$. The gain in risk over $\hat{\beta}^{UR}$ is
substantial when $\Delta$ is at or near zero.



7 Discussion 



In this paper, we reviewed positive-shrinkage and pretest estimation in the context of a
multiple linear regression model, and presented the asymptotic bias and risk expressions
for the estimators.

When prior information about certain covariates is available, shrinkage estimators are
obtained directly by combining the full and sub-model estimates. If, on the other hand,
prior information is not available, shrinkage estimation takes a two-step approach. In the
first step, a set of covariates is selected based on a suitable model selection criterion
such as AIC, BIC, or best subset selection. The remaining covariates are then treated as
nuisance covariates, which forms a parametric restriction on the full model. In the second
step, the full and sub-model estimates are combined in a way that minimizes the quadratic
risk.

To illustrate the methods, three different data sets have been considered to obtain
restricted, positive-shrinkage, and pretest estimators. Average prediction errors based on
repeated cross-validation estimates of the error rates show that the pretest and restricted
estimators have superior risk performance compared to the unrestricted and positive-shrinkage
estimators when the underlying model is correctly specified. This is not unusual, since the
restricted estimator dominates all other estimators when the prior information is correct.
Since the data considered in this study have been interactively analyzed using various model
selection criteria, it is expected that the sub-models consist of the best subsets of the
available covariates for the respective data sets. Theoretically, this is equivalent to the
case where $\Delta = 0$, or very close to zero. The real data examples, however, do not tell us
how sensitive the prediction errors are to model misspecification. Therefore, we conducted a
Monte Carlo simulation to study such characteristics for the positive-shrinkage and pretest
estimators under varying $\Delta$ and different sizes of the nuisance subsets.

In the Monte Carlo study, we numerically computed relative mean squared errors (RMSE) for
the restricted, positive-shrinkage, and pretest estimators with respect to the unrestricted
estimator. Our study re-established the fact that the restricted estimator outperforms the
unrestricted estimator at or near the pivot ($\Delta = 0$). However, as we deviate from the
pivot ($\Delta > 0$), the risk of the restricted estimator becomes unbounded. For the cases
considered in the simulation, the pretest estimator initially deteriorates even faster than
the restricted estimator. However, as $\Delta$ increases further, the pretest estimator
recovers, and its RMSE approaches unity from below. On the other hand, the performance of the
positive-shrinkage estimator decays at the slowest rate as $\Delta$ increases, and it performs
steadily throughout a wider range of the alternative parameter subspace. In particular, when
the nuisance subset is large, the positive-shrinkage estimator outperforms all other
estimators, which can be seen in panels (b) and (d) of the figure reporting the simulation
results.
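The qualitative behaviour described above can be reproduced in a much simpler setting than the regression design used in the study. The following sketch swaps in a normal-means model, $y \sim N_p(\theta, I)$ with the restriction $\theta = 0$, purely as an illustrative stand-in; it is not the simulation reported in this paper. The names `simulate_rmse` and `results` are hypothetical, the relative efficiency is taken as the ratio of the unrestricted MSE to the MSE of each estimator, and the critical value 15.507 is the upper 5% point of the central chi-square distribution with 8 degrees of freedom.

```python
import math
import random


def simulate_rmse(p, delta, crit, reps=4000, seed=1):
    """Relative efficiency MSE(UR) / MSE(est) in a normal-means model.

    y ~ N_p(theta, I) with ||theta||^2 = delta; the restriction is theta = 0.
    """
    rng = random.Random(seed)
    theta = [math.sqrt(delta)] + [0.0] * (p - 1)
    sse = {"UR": 0.0, "R": 0.0, "PT": 0.0, "S+": 0.0}
    for _ in range(reps):
        y = [t + rng.gauss(0.0, 1.0) for t in theta]
        t_stat = sum(v * v for v in y)              # test statistic for theta = 0
        shrink = max(0.0, 1.0 - (p - 2) / t_stat)   # positive-part shrinkage weight
        keep = 1.0 if t_stat > crit else 0.0        # pretest: keep UR only if rejected
        estimates = {
            "UR": y,
            "R": [0.0] * p,
            "PT": [keep * v for v in y],
            "S+": [shrink * v for v in y],
        }
        for name, est in estimates.items():
            sse[name] += sum((e - t) ** 2 for e, t in zip(est, theta))
    return {name: (sse["UR"] / sse[name] if sse[name] > 0 else float("inf"))
            for name in ("R", "PT", "S+")}


# p = 8 components; crit is the 95th percentile of a chi-square with 8 df
results = {d: simulate_rmse(8, d, crit=15.507) for d in (0.5, 4.0, 16.0, 64.0)}
for d, res in results.items():
    print(d, {name: round(value, 3) for name, value in res.items()})
```

Values above one indicate an improvement over the unrestricted estimator. In this stand-in model the restricted estimator gains heavily near the pivot and loses without bound far from it, the pretest estimator returns to unity for large $\Delta$, and the positive-part estimator stays above unity throughout.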



7.1 Future directions 

The pretest estimator selects either the restricted or the unrestricted estimator, depending
on the outcome of a significance test, while the positive-shrinkage estimator shrinks the
coefficients towards the restricted subspace. The nuisance subset is ideally a null space
when its covariates do not contribute anything towards the estimation process. In this sense,
shrinkage estimators resemble penalized estimators such as the least absolute shrinkage and
selection operator (lasso). Proposed by Tibshirani, lasso is a member of the penalized least
squares (PLS) family, which performs variable selection and parameter estimation
simultaneously. Lasso estimates can be obtained via a cyclical coordinate descent algorithm.
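As a rough illustration of how cyclical coordinate descent produces lasso estimates, the sketch below minimizes $(1/2n)\|y - X\beta\|^2 + \lambda \|\beta\|_1$ by updating one coefficient at a time with a soft-thresholding step. This is a minimal pure-Python sketch under stated assumptions (no intercept, centered data, and this particular scaling of the objective); the names `soft_threshold` and `lasso_cd` are hypothetical and do not refer to any particular software package.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrink z towards zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200, tol=1e-10):
    """Lasso by cyclical coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # a constant-zero column carries no information
            # Correlation of column j with the partial residual (excluding beta_j)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_ss[j]
            change = new - beta[j]
            if change != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * change
                beta[j] = new
            max_change = max(max_change, abs(change))
        if max_change < tol:
            break
    return beta


# Small orthogonal design: the second coefficient is weak and gets eliminated.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [3.0, 0.2, -3.0, -0.2]
print(lasso_cd(X, y, lam=0.0))   # least squares fit
print(lasso_cd(X, y, lam=0.25))  # penalized fit
```

With $\lambda = 0$ the procedure returns the least squares fit, while a positive $\lambda$ shrinks the strong coefficient and sets the weak one exactly to zero, which is the simultaneous selection and estimation property mentioned above.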

Shrinkage estimation performs variable selection by shrinking the coefficients towards the
restricted subspace. In doing so, some of the coefficients shrink towards zero, while others
are over-shrunk, which reverses the sign of the coefficient. The change of sign may be
uncomfortable for practitioners, although it does not affect the risk performance. The
positive-part shrinkage estimator takes care of the negative part by setting the coefficient
to exactly zero. In the process, most of the coefficients are shrunk, while some of them are
eliminated by shrinking to zero.
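The difference between the three combination rules is easiest to see in code. The sketch below assumes the forms that are standard in this literature, since the defining equations are not restated in this section: each estimator is written as $\hat{\beta}^{R} + w\,(\hat{\beta}^{UR} - \hat{\beta}^{R})$, with $w = I(T > c_\alpha)$ for the pretest estimator, $w = 1 - (p_2 - 2)/T$ for the shrinkage estimator, and the positive part of that weight for the positive-shrinkage estimator, where $T$ is the test statistic. The function name `combine` and the numerical inputs are hypothetical.

```python
def combine(beta_ur, beta_r, t_stat, p2, crit):
    """Pretest (PT), shrinkage (S) and positive-part shrinkage (S+) estimates.

    Each is beta_r + w * (beta_ur - beta_r) with a different weight w.
    """
    diff = [u - r for u, r in zip(beta_ur, beta_r)]
    w_s = 1.0 - (p2 - 2) / t_stat          # can be negative: over-shrinkage
    w_pt = 1.0 if t_stat > crit else 0.0   # all-or-nothing choice
    w_sp = max(0.0, w_s)                   # negative weight truncated at zero

    def build(w):
        return [r + w * d for r, d in zip(beta_r, diff)]

    return {"PT": build(w_pt), "S": build(w_s), "S+": build(w_sp)}


# Hypothetical estimates: the last three coefficients are restricted to zero.
beta_ur = [1.0, 0.5, 0.2, -0.3]
beta_r = [0.9, 0.0, 0.0, 0.0]
out = combine(beta_ur, beta_r, t_stat=0.5, p2=3, crit=7.815)
for name, est in out.items():
    print(name, [round(v, 3) for v in est])
```

With a small test statistic, the plain shrinkage weight is negative and the signs of the last three coefficients flip relative to the unrestricted estimates, whereas the positive-part rule falls back to the restricted estimates and sets those coefficients to zero.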

Since the introduction of lasso, there has been a tremendous amount of development in
lasso and related absolute penalty estimation (APE) over the past decade and a half.
Although the lasso and shrinkage methods have been around for quite some time, little
work has been done to compare their relative performance. Recently, Ahmed et al. (2007)
compared positive shrinkage and lasso in a partially linear regression setup. However, no
comparative study of shrinkage and absolute penalty estimators in the multiple linear
regression model has been found in the reviewed literature. We are currently working on this
front, and the findings will be disseminated through future communications.